Overview

Dataset statistics

Number of variables: 40
Number of observations: 302
Missing cells: 846
Missing cells (%): 7.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 422.8 KiB
Average record size in memory: 1.4 KiB
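
The percentage and per-record figures in this block follow from the raw counts. A minimal sketch of that arithmetic, assuming missing cells are measured against the full variables × observations grid and that the average record size is total memory divided by the number of observations (variable names are illustrative):

```python
# Figures copied from the dataset statistics above.
n_variables = 40
n_observations = 302
missing_cells = 846
total_memory_kib = 422.8

# Share of missing cells over the full variables x observations grid.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Average memory per record, in KiB.
avg_record_kib = total_memory_kib / n_observations

print(f"{total_cells} cells, {missing_pct:.1f}% missing")  # 12080 cells, 7.0% missing
print(f"{avg_record_kib:.1f} KiB per record")              # 1.4 KiB per record
```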

Variable types

Categorical: 27
DateTime: 3
Numeric: 8
Unsupported: 2

Warnings

TP_NOT has constant value "2" Constant
ID_AGRAVO has constant value "B54" Constant
NU_ANO has constant value "2013" Constant
SG_UF_NOT has constant value "33" Constant
ID_REGIONA has constant value "" Constant
SG_UF has constant value "33" Constant
ID_RG_RESI has constant value "" Constant
ID_PAIS has constant value "1" Constant
DEXAME has a high cardinality: 174 distinct values High cardinality
DTRATA has a high cardinality: 74 distinct values High cardinality
SEM_NOT is highly correlated with SEM_PRI High correlation
ID_MUNICIP is highly correlated with ID_MN_RESI High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
ID_MN_RESI is highly correlated with ID_MUNICIP High correlation
SEM_NOT is highly correlated with SEM_PRI High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
CLASSI_FIN is highly correlated with COPAISINF High correlation
COPAISINF is highly correlated with CLASSI_FIN High correlation
SEM_NOT is highly correlated with SEM_PRI High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
CLASSI_FIN is highly correlated with COPAISINF High correlation
COPAISINF is highly correlated with CLASSI_FIN High correlation
COUFINF is highly correlated with RESULT and 8 other fields High correlation
PMM is highly correlated with AT_SINTOMA and 3 other fields High correlation
RESULT is highly correlated with COUFINF and 12 other fields High correlation
AT_SINTOMA is highly correlated with PMM and 4 other fields High correlation
ID_UNIDADE is highly correlated with DTRATA and 4 other fields High correlation
SEM_NOT is highly correlated with SEM_PRI High correlation
DTRATA is highly correlated with COUFINF and 15 other fields High correlation
AT_LAMINA is highly correlated with RESULT and 3 other fields High correlation
CS_ESCOL_N is highly correlated with DTRATA and 1 other fields High correlation
ID_MUNICIP is highly correlated with ID_UNIDADE and 8 other fields High correlation
ID_OCUPA_N is highly correlated with COUFINF and 9 other fields High correlation
COMUNINF is highly correlated with COUFINF and 8 other fields High correlation
CLASSI_FIN is highly correlated with COUFINF and 10 other fields High correlation
LOC_INF is highly correlated with COUFINF and 10 other fields High correlation
COPAISINF is highly correlated with RESULT and 9 other fields High correlation
DSTRAESQUE is highly correlated with COUFINF and 11 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 9 other fields High correlation
CS_GESTANT is highly correlated with TRA_ESQUEM and 1 other fields High correlation
TRA_ESQUEM is highly correlated with RESULT and 10 other fields High correlation
SEM_PRI is highly correlated with SEM_NOT High correlation
AT_ATIVIDA is highly correlated with RESULT and 7 other fields High correlation
CS_SEXO is highly correlated with CS_GESTANT High correlation
PCRUZ is highly correlated with COUFINF and 10 other fields High correlation
ID_MN_RESI is highly correlated with DTRATA and 2 other fields High correlation
COUFINF is highly correlated with ID_REGIONA and 10 other fields High correlation
ID_REGIONA is highly correlated with COUFINF and 24 other fields High correlation
DTRATA is highly correlated with COUFINF and 16 other fields High correlation
CS_ESCOL_N is highly correlated with ID_REGIONA and 7 other fields High correlation
ID_OCUPA_N is highly correlated with ID_REGIONA and 7 other fields High correlation
DSTRAESQUE is highly correlated with ID_REGIONA and 8 other fields High correlation
ID_PAIS is highly correlated with COUFINF and 24 other fields High correlation
NU_ANO is highly correlated with COUFINF and 24 other fields High correlation
CS_SEXO is highly correlated with ID_REGIONA and 8 other fields High correlation
LOC_INF is highly correlated with ID_REGIONA and 9 other fields High correlation
SG_UF is highly correlated with COUFINF and 24 other fields High correlation
CS_RACA is highly correlated with ID_REGIONA and 7 other fields High correlation
RESULT is highly correlated with ID_REGIONA and 13 other fields High correlation
AT_SINTOMA is highly correlated with ID_REGIONA and 10 other fields High correlation
SG_UF_NOT is highly correlated with COUFINF and 24 other fields High correlation
TP_NOT is highly correlated with COUFINF and 24 other fields High correlation
AT_LAMINA is highly correlated with ID_REGIONA and 10 other fields High correlation
COMUNINF is highly correlated with COUFINF and 11 other fields High correlation
TPAUTOCTO is highly correlated with COUFINF and 14 other fields High correlation
ID_AGRAVO is highly correlated with COUFINF and 24 other fields High correlation
CS_GESTANT is highly correlated with ID_REGIONA and 9 other fields High correlation
ID_RG_RESI is highly correlated with COUFINF and 24 other fields High correlation
TRA_ESQUEM is highly correlated with ID_REGIONA and 12 other fields High correlation
AT_ATIVIDA is highly correlated with ID_REGIONA and 9 other fields High correlation
CLASSI_FIN is highly correlated with ID_REGIONA and 12 other fields High correlation
PCRUZ is highly correlated with ID_REGIONA and 10 other fields High correlation
DT_NASC has 13 (4.3%) missing values Missing
DT_INVEST has 302 (100.0%) missing values Missing
PMM has 229 (75.8%) missing values Missing
DT_ENCERRA has 302 (100.0%) missing values Missing
DEXAME is uniformly distributed Uniform
DT_INVEST is an unsupported type, check if it needs cleaning or further analysis Unsupported
DT_ENCERRA is an unsupported type, check if it needs cleaning or further analysis Unsupported
COPAISINF has 232 (76.8%) zeros Zeros
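
The "Constant" warnings flag variables that hold a single distinct value. Note that ID_REGIONA and ID_RG_RESI are reported as constant with the value "", so empty strings count as a value here rather than as missing. A minimal sketch of that check in plain Python; the `constant_columns` helper and the four-row sample are illustrative assumptions, not the profiler's own implementation or the real data:

```python
from typing import Dict, List, Sequence


def constant_columns(table: Dict[str, Sequence[object]]) -> List[str]:
    """Return the names of columns whose values are all identical."""
    return [name for name, values in table.items() if len(set(values)) == 1]


# Tiny illustrative sample, not the real data set.
sample = {
    "TP_NOT": ["2", "2", "2", "2"],
    "ID_REGIONA": ["", "", "", ""],   # empty strings still form one distinct value
    "CS_SEXO": ["M", "F", "F", "M"],
}

print(constant_columns(sample))  # ['TP_NOT', 'ID_REGIONA']
```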

Reproduction

Analysis started: 2021-07-06 18:42:12.924589
Analysis finished: 2021-07-06 18:42:33.616866
Duration: 20.69 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

TP_NOT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 17.2 KiB

Value  Count
2      302

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 302
Distinct characters: 1
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2

Common Values

Value  Count  Frequency (%)
2      302    100.0%

Length

[Figure: histogram of lengths of the category]

[Figure: pie chart]

Value  Count  Frequency (%)
2      302    100.0%

Most occurring characters

Value  Count  Frequency (%)
2      302    100.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  302    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
2      302    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  302    100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
2      302    100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  302    100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
2      302    100.0%

ID_AGRAVO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size22.5 KiB
B54
302 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters906
Distinct characters3
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowB54
2nd rowB54
3rd rowB54
4th rowB54
5th rowB54

Common Values

ValueCountFrequency (%)
B54302
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
b54302
100.0%

Most occurring characters

Value  Count  Frequency (%)
B      302    33.3%
5      302    33.3%
4      302    33.3%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    604    66.7%
Uppercase Letter  302    33.3%
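
The category counts above can be reproduced with Python's standard `unicodedata` module, where `Nd` is the general category "Decimal Number" and `Lu` is "Uppercase Letter". This is a sketch of the idea rather than the profiler's own code; `category_counts` is a hypothetical helper:

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in values."""
    counts = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    return counts


# ID_AGRAVO holds the same code, "B54", in all 302 rows.
counts = category_counts(["B54"] * 302)
print(counts.most_common())  # [('Nd', 604), ('Lu', 302)]
```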

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
5302
50.0%
4302
50.0%
Uppercase Letter
ValueCountFrequency (%)
B302
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common604
66.7%
Latin302
33.3%

Most frequent character per script

Common
ValueCountFrequency (%)
5302
50.0%
4302
50.0%
Latin
ValueCountFrequency (%)
B302
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII906
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
B302
33.3%
5302
33.3%
4302
33.3%

Distinct: 172
Distinct (%): 57.0%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB
Minimum: 2013-01-02 00:00:00
Maximum: 2013-12-31 00:00:00
Histogram with fixed size bins (bins=50)

SEM_NOT
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 61
Distinct (%): 20.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 192716.1589
Minimum: 1302
Maximum: 201401
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 1302
5-th percentile: 201301
Q1: 201309
median: 201323
Q3: 201338
95-th percentile: 201350
Maximum: 201401
Range: 200099
Interquartile range (IQR): 29

Descriptive statistics

Standard deviation: 40658.18205
Coefficient of variation (CV): 0.2109744314
Kurtosis: 18.60210493
Mean: 192716.1589
Median Absolute Deviation (MAD): 15
Skewness: -4.52536731
Sum: 58200280
Variance: 1653087768
Monotonicity: Not monotonic
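
Several of the SEM_NOT statistics are derived from the others: range is maximum minus minimum, IQR is Q3 minus Q1, CV is standard deviation over mean, and variance is the squared standard deviation. A small consistency check using the reported figures (the variable names are illustrative):

```python
# Values copied from the SEM_NOT statistics above.
minimum, maximum = 1302, 201401
q1, q3 = 201309, 201338
mean = 192716.1589
std = 40658.18205
total = 58200280
n = 302

value_range = maximum - minimum   # reported Range: 200099
iqr = q3 - q1                     # reported IQR: 29
cv = std / mean                   # reported CV: 0.2109744314
variance = std ** 2               # reported Variance: 1653087768

print(value_range, iqr)           # 200099 29
print(f"{cv:.6f}", f"{total / n:.4f}", f"{variance:.0f}")
```

The results agree with the report up to rounding of the printed inputs.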
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count  Frequency (%)
201302             12     4.0%
201329             12     4.0%
201312             11     3.6%
201306             11     3.6%
201338             10     3.3%
201342             10     3.3%
201351             10     3.3%
201350             9      3.0%
201303             9      3.0%
201328             9      3.0%
Other values (51)  199    65.9%

Smallest values

Value   Count  Frequency (%)
1302    1      0.3%
1316    2      0.7%
1319    1      0.3%
1328    2      0.7%
1339    1      0.3%
1345    3      1.0%
1346    2      0.7%
1347    1      0.3%
201301  5      1.7%
201302  12     4.0%

Largest values

Value   Count  Frequency (%)
201401  2      0.7%
201352  2      0.7%
201351  10     3.3%
201350  9      3.0%
201349  7      2.3%
201348  5      1.7%
201347  1      0.3%
201346  3      1.0%
201345  4      1.3%
201344  2      0.7%
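
The smallest SEM_NOT values (1302 to 1347) have four digits while all others have six, which suggests that two encodings are mixed: a two-digit-year form (YYWW) and the usual YYYYWW. That mix would explain why the mean (about 192716) sits far below the median (201323) and why the standard deviation is so large. The sketch below normalises the short codes under the assumption, not confirmed by the report, that they mean 20YY; `normalize_epi_week` is a hypothetical helper:

```python
def normalize_epi_week(code: int) -> int:
    """Return a YYYYWW code; four-digit inputs are read as YYWW (assumed 20YY)."""
    if code < 10_000:                      # e.g. 1302 -> year 13, week 02
        year, week = divmod(code, 100)
        return (2000 + year) * 100 + week
    return code


# Values taken from the extreme-value tables above.
raw = [1302, 1316, 1347, 201301, 201401]
print([normalize_epi_week(c) for c in raw])
# [201302, 201316, 201347, 201301, 201401]
```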

NU_ANO
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size18.1 KiB
2013
302 

Length

Max length4
Median length4
Mean length4
Min length4

Characters and Unicode

Total characters1208
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2013
2nd row2013
3rd row2013
4th row2013
5th row2013

Common Values

ValueCountFrequency (%)
2013302
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2013302
100.0%

Most occurring characters

ValueCountFrequency (%)
2302
25.0%
0302
25.0%
1302
25.0%
3302
25.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number1208
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2302
25.0%
0302
25.0%
1302
25.0%
3302
25.0%

Most occurring scripts

ValueCountFrequency (%)
Common1208
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2302
25.0%
0302
25.0%
1302
25.0%
3302
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1208
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2302
25.0%
0302
25.0%
1302
25.0%
3302
25.0%

SG_UF_NOT
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size17.5 KiB
33
302 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters604
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33302
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33302
100.0%

Most occurring characters

ValueCountFrequency (%)
3604
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number604
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3604
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common604
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3604
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII604
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3604
100.0%

ID_MUNICIP
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct13
Distinct (%)4.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean330440.1821
Minimum330030
Maximum330630
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.5 KiB

Quantile statistics

Minimum330030
5-th percentile330330.5
Q1330455
median330455
Q3330455
95-th percentile330455
Maximum330630
Range600
Interquartile range (IQR)0

Descriptive statistics

Standard deviation64.52651425
Coefficient of variation (CV)0.000195274418
Kurtosis19.15966098
Mean330440.1821
Median Absolute Deviation (MAD)0
Skewness-4.206748658
Sum99792935
Variance4163.671041
MonotonicityNot monotonic
Histogram with fixed size bins (bins=13)
ValueCountFrequency (%)
330455274
90.7%
3301005
 
1.7%
3302003
 
1.0%
3304903
 
1.0%
3304203
 
1.0%
3303903
 
1.0%
3303402
 
0.7%
3303302
 
0.7%
3302502
 
0.7%
3302402
 
0.7%
Other values (3)3
 
1.0%
ValueCountFrequency (%)
3300301
 
0.3%
3301005
1.7%
3302003
1.0%
3302402
 
0.7%
3302502
 
0.7%
3302851
 
0.3%
3303302
 
0.7%
3303402
 
0.7%
3303903
1.0%
3304203
1.0%
ValueCountFrequency (%)
3306301
 
0.3%
3304903
 
1.0%
330455274
90.7%
3304203
 
1.0%
3303903
 
1.0%
3303402
 
0.7%
3303302
 
0.7%
3302851
 
0.3%
3302502
 
0.7%
3302402
 
0.7%

ID_REGIONA
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size18.1 KiB
302 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
302
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

ID_UNIDADE
Real number (ℝ≥0)

HIGH CORRELATION

Distinct54
Distinct (%)17.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3075789.94
Minimum63
Maximum6858317
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.5 KiB

Quantile statistics

Minimum63
5-th percentile2270471
Q12288338
median2288338
Q33006035.5
95-th percentile6344097
Maximum6858317
Range6858254
Interquartile range (IQR)717697.5

Descriptive statistics

Standard deviation1499096.324
Coefficient of variation (CV)0.4873857946
Kurtosis0.3570232422
Mean3075789.94
Median Absolute Deviation (MAD)267
Skewness1.275499348
Sum928888562
Variance: 2.247289789 × 10^12
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count  Frequency (%)
2288338            148    49.0%
5476321            39     12.9%
2270471            26     8.6%
3005992            8      2.6%
6344097            4      1.3%
2287471            4      1.3%
6487815            4      1.3%
6858317            3      1.0%
6753469            3      1.0%
2288605            3      1.0%
Other values (44)  60     19.9%

Smallest values

Value    Count  Frequency (%)
63       1      0.3%
12505    1      0.3%
12580    1      0.3%
25135    1      0.3%
2269783  3      1.0%
2269805  1      0.3%
2269988  1      0.3%
2270269  1      0.3%
2270471  26     8.6%
2270560  1      0.3%

Largest values

Value    Count  Frequency (%)
6858317  3      1.0%
6753469  3      1.0%
6694330  1      0.3%
6681573  1      0.3%
6629989  1      0.3%
6629385  1      0.3%
6487815  4      1.3%
6344097  4      1.3%
6176666  3      1.0%
6038891  1      0.3%

Distinct: 197
Distinct (%): 65.2%
Missing: 0
Missing (%): 0.0%
Memory size: 2.5 KiB
Minimum: 2012-12-21 00:00:00
Maximum: 2013-12-30 00:00:00
Histogram with fixed size bins (bins=50)

SEM_PRI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct63
Distinct (%)20.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean192714.1126
Minimum1302
Maximum201401
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.5 KiB

Quantile statistics

Minimum1302
5-th percentile201252
Q1201307
median201322.5
Q3201336.75
95-th percentile201350
Maximum201401
Range200099
Interquartile range (IQR)29.75

Descriptive statistics

Standard deviation40658.00942
Coefficient of variation (CV)0.2109757758
Kurtosis18.60210368
Mean192714.1126
Median Absolute Deviation (MAD)14.5
Skewness-4.525367088
Sum58199662
Variance1653073730
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
20130215
 
5.0%
20132913
 
4.3%
20130511
 
3.6%
20134211
 
3.6%
20133010
 
3.3%
20131310
 
3.3%
20131110
 
3.3%
2013019
 
3.0%
2013519
 
3.0%
2013498
 
2.6%
Other values (53)196
64.9%
ValueCountFrequency (%)
13021
 
0.3%
13163
1.0%
13171
 
0.3%
13271
 
0.3%
13391
 
0.3%
13454
1.3%
13461
 
0.3%
13471
 
0.3%
2012511
 
0.3%
2012523
1.0%
ValueCountFrequency (%)
2014011
 
0.3%
2013522
 
0.7%
2013519
3.0%
2013506
2.0%
2013498
2.6%
2013484
1.3%
2013472
 
0.7%
2013461
 
0.3%
2013455
1.7%
2013442
 
0.7%

DT_NASC
Date

MISSING

Distinct: 269
Distinct (%): 93.1%
Missing: 13
Missing (%): 4.3%
Memory size: 2.5 KiB
Minimum: 1934-06-15 00:00:00
Maximum: 2013-02-02 00:00:00
Histogram with fixed size bins (bins=50)

NU_IDADE_N
Real number (ℝ≥0)

Distinct69
Distinct (%)22.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4028
Minimum2000
Maximum4079
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.5 KiB

Quantile statistics

Minimum2000
5-th percentile4016
Q14027
median4036
Q34049
95-th percentile4064
Maximum4079
Range2079
Interquartile range (IQR)22

Descriptive statistics

Standard deviation132.2550893
Coefficient of variation (CV)0.03283393477
Kurtosis196.6217203
Mean4028
Median Absolute Deviation (MAD)11
Skewness-13.61598119
Sum1216456
Variance17491.40864
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Most frequent values

Value              Count  Frequency (%)
4031               17     5.6%
4034               13     4.3%
4027               12     4.0%
4026               10     3.3%
4048               10     3.3%
4039               9      3.0%
4047               9      3.0%
4052               9      3.0%
4028               9      3.0%
4030               9      3.0%
Other values (59)  195    64.6%

Smallest values

Value  Count  Frequency (%)
2000   1      0.3%
3002   1      0.3%
4001   2      0.7%
4002   2      0.7%
4003   2      0.7%
4005   1      0.3%
4010   1      0.3%
4011   1      0.3%
4012   1      0.3%
4013   1      0.3%

Largest values

Value  Count  Frequency (%)
4079   1      0.3%
4078   1      0.3%
4076   1      0.3%
4072   3      1.0%
4070   1      0.3%
4069   2      0.7%
4067   1      0.3%
4066   1      0.3%
4065   2      0.7%
4064   4      1.3%
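
Almost all NU_IDADE_N values fall between 4001 and 4079, with two outliers at 2000 and 3002. This pattern is consistent with a composite age code in which the leading digit gives the unit and the remaining three digits give the quantity. The sketch below assumes the SINAN convention (1 = hours, 2 = days, 3 = months, 4 = years), which the report itself does not state; `decode_age` and `UNITS` are hypothetical names:

```python
UNITS = {1: "hours", 2: "days", 3: "months", 4: "years"}  # assumed coding


def decode_age(code: int):
    """Split a composite age code into (quantity, unit)."""
    unit_digit, quantity = divmod(code, 1000)
    return quantity, UNITS.get(unit_digit, "unknown")


for code in (4031, 3002, 2000):
    print(code, decode_age(code))
# 4031 (31, 'years')
# 3002 (2, 'months')
# 2000 (0, 'days')
```

If this reading holds, summary statistics such as the mean of 4028 describe the raw codes rather than ages and should be recomputed after decoding.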

CS_SEXO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct2
Distinct (%)0.7%
Missing0
Missing (%)0.0%
Memory size19.6 KiB
M
191 
F
111 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters302
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowM
2nd rowF
3rd rowF
4th rowM
5th rowM

Common Values

ValueCountFrequency (%)
M191
63.2%
F111
36.8%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
m191
63.2%
f111
36.8%

Most occurring characters

ValueCountFrequency (%)
M191
63.2%
F111
36.8%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter302
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
M191
63.2%
F111
36.8%

Most occurring scripts

ValueCountFrequency (%)
Latin302
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
M191
63.2%
F111
36.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII302
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
M191
63.2%
F111
36.8%

CS_GESTANT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct5
Distinct (%)1.7%
Missing0
Missing (%)0.0%
Memory size17.2 KiB
6
196 
5
91 
9
 
12
2
 
2
1
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters302
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.3%

Sample

1st row6
2nd row5
3rd row5
4th row6
5th row6

Common Values

ValueCountFrequency (%)
6196
64.9%
591
30.1%
912
 
4.0%
22
 
0.7%
11
 
0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
6196
64.9%
591
30.1%
912
 
4.0%
22
 
0.7%
11
 
0.3%

Most occurring characters

ValueCountFrequency (%)
6196
64.9%
591
30.1%
912
 
4.0%
22
 
0.7%
11
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number302
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
6196
64.9%
591
30.1%
912
 
4.0%
22
 
0.7%
11
 
0.3%

Most occurring scripts

ValueCountFrequency (%)
Common302
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
6196
64.9%
591
30.1%
912
 
4.0%
22
 
0.7%
11
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII302
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
6196
64.9%
591
30.1%
912
 
4.0%
22
 
0.7%
11
 
0.3%

CS_RACA
Categorical

HIGH CORRELATION

Distinct: 7
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB

Value             Count
1                 177
9                 50
2                 36
4                 28
(empty string)    8
Other values (2)  3

Length

Max length1
Median length1
Mean length0.9735099338
Min length0

Characters and Unicode

Total characters294
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique1 ?
Unique (%)0.3%

Sample

1st row9
2nd row9
3rd row1
4th row1
5th row1

Common Values

Value           Count  Frequency (%)
1               177    58.6%
9               50     16.6%
2               36     11.9%
4               28     9.3%
(empty string)  8      2.6%
3               2      0.7%
5               1      0.3%

Length

Histogram of lengths of the category

Pie chart

Value  Count  Frequency (%)
1      177    60.2%
9      50     17.0%
2      36     12.2%
4      28     9.5%
3      2      0.7%
5      1      0.3%
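
The two CS_RACA frequency tables show different percentages for the same counts (58.6% versus 60.2% for value 1). The numbers are consistent with the first table dividing by all 302 records and the second dividing by the 294 records that are not empty strings. A short check of that reading, which is an inference from the figures rather than something the report states:

```python
# Counts copied from the CS_RACA tables above; "" is the empty value.
counts = {"1": 177, "9": 50, "2": 36, "4": 28, "": 8, "3": 2, "5": 1}

total_rows = sum(counts.values())                          # all 302 records
non_empty = sum(c for v, c in counts.items() if v != "")   # 294 records

for value in ("1", "9"):
    share_all = 100 * counts[value] / total_rows
    share_non_empty = 100 * counts[value] / non_empty
    print(f"{value}: {share_all:.1f}% of all rows, {share_non_empty:.1f}% of non-empty rows")
# 1: 58.6% of all rows, 60.2% of non-empty rows
# 9: 16.6% of all rows, 17.0% of non-empty rows
```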

Most occurring characters

ValueCountFrequency (%)
1177
60.2%
950
 
17.0%
236
 
12.2%
428
 
9.5%
32
 
0.7%
51
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number294
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1177
60.2%
950
 
17.0%
236
 
12.2%
428
 
9.5%
32
 
0.7%
51
 
0.3%

Most occurring scripts

ValueCountFrequency (%)
Common294
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1177
60.2%
950
 
17.0%
236
 
12.2%
428
 
9.5%
32
 
0.7%
51
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII294
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1177
60.2%
950
 
17.0%
236
 
12.2%
428
 
9.5%
32
 
0.7%
51
 
0.3%

CS_ESCOL_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct11
Distinct (%)3.6%
Missing0
Missing (%)0.0%
Memory size17.6 KiB
08
99 
09
76 
06
40 
29 
07
18 
Other values (6)
40 

Length

Max length2
Median length2
Mean length1.80794702
Min length0

Characters and Unicode

Total characters546
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row09
2nd row09
3rd row06
4th row09
5th row08

Common Values

ValueCountFrequency (%)
0899
32.8%
0976
25.2%
0640
13.2%
29
 
9.6%
0718
 
6.0%
0511
 
3.6%
109
 
3.0%
037
 
2.3%
046
 
2.0%
024
 
1.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
0899
36.3%
0976
27.8%
0640
14.7%
0718
 
6.6%
0511
 
4.0%
109
 
3.3%
037
 
2.6%
046
 
2.2%
024
 
1.5%
013
 
1.1%

Most occurring characters

ValueCountFrequency (%)
0273
50.0%
899
 
18.1%
976
 
13.9%
640
 
7.3%
718
 
3.3%
112
 
2.2%
511
 
2.0%
37
 
1.3%
46
 
1.1%
24
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number546
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0273
50.0%
899
 
18.1%
976
 
13.9%
640
 
7.3%
718
 
3.3%
112
 
2.2%
511
 
2.0%
37
 
1.3%
46
 
1.1%
24
 
0.7%

Most occurring scripts

ValueCountFrequency (%)
Common546
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
0273
50.0%
899
 
18.1%
976
 
13.9%
640
 
7.3%
718
 
3.3%
112
 
2.2%
511
 
2.0%
37
 
1.3%
46
 
1.1%
24
 
0.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII546
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0273
50.0%
899
 
18.1%
976
 
13.9%
640
 
7.3%
718
 
3.3%
112
 
2.2%
511
 
2.0%
37
 
1.3%
46
 
1.1%
24
 
0.7%

SG_UF
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size17.5 KiB
33
302 

Length

Max length2
Median length2
Mean length2
Min length2

Characters and Unicode

Total characters604
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row33
2nd row33
3rd row33
4th row33
5th row33

Common Values

ValueCountFrequency (%)
33302
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
33302
100.0%

Most occurring characters

ValueCountFrequency (%)
3604
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number604
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
3604
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common604
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
3604
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII604
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
3604
100.0%

ID_MN_RESI
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION

Distinct24
Distinct (%)7.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean330399.8775
Minimum330030
Maximum330630
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size2.5 KiB

Quantile statistics

Minimum330030
5-th percentile330170
Q1330350
median330455
Q3330455
95-th percentile330455
Maximum330630
Range600
Interquartile range (IQR)105

Descriptive statistics

Standard deviation109.0850268
Coefficient of variation (CV)0.0003301606153
Kurtosis2.099342439
Mean330399.8775
Median Absolute Deviation (MAD)0
Skewness-1.707888211
Sum99780763
Variance11899.54308
MonotonicityNot monotonic
Histogram with fixed size bins (bins=24)
ValueCountFrequency (%)
330455202
66.9%
33033016
 
5.3%
33017015
 
5.0%
33035010
 
3.3%
3304909
 
3.0%
3302506
 
2.0%
3302406
 
2.0%
3301005
 
1.7%
3302004
 
1.3%
3304204
 
1.3%
Other values (14)25
 
8.3%
ValueCountFrequency (%)
3300303
 
1.0%
3300401
 
0.3%
3300452
 
0.7%
3301005
 
1.7%
33017015
5.0%
3301851
 
0.3%
3301902
 
0.7%
3302004
 
1.3%
3302406
 
2.0%
3302451
 
0.3%
ValueCountFrequency (%)
3306301
 
0.3%
3305801
 
0.3%
3305102
 
0.7%
3304909
 
3.0%
330455202
66.9%
3304204
 
1.3%
3304142
 
0.7%
3303903
 
1.0%
33035010
 
3.3%
3303404
 
1.3%

ID_RG_RESI
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size18.1 KiB
302 

Length

Max length0
Median length0
Mean length0
Min length0

Characters and Unicode

Total characters0
Distinct characters0
Distinct categories0 ?
Distinct scripts0 ?
Distinct blocks0 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

ValueCountFrequency (%)
302
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
No values found.

Most occurring characters

ValueCountFrequency (%)
No values found.

Most occurring categories

ValueCountFrequency (%)
No values found.

Most frequent character per category

Most occurring scripts

ValueCountFrequency (%)
No values found.

Most frequent character per script

Most occurring blocks

ValueCountFrequency (%)
No values found.

Most frequent character per block

ID_PAIS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size17.2 KiB
1
302 

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters302
Distinct characters1
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row1
3rd row1
4th row1
5th row1

Common Values

ValueCountFrequency (%)
1302
100.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1302
100.0%

Most occurring characters

ValueCountFrequency (%)
1302
100.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number302
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1302
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common302
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1302
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII302
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1302
100.0%

DT_INVEST
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing302
Missing (%)100.0%
Memory size2.5 KiB

ID_OCUPA_N
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct43
Distinct (%)14.2%
Missing0
Missing (%)0.0%
Memory size18.3 KiB
228 
999991
 
16
241005
 
6
998999
 
6
999992
 
4
Other values (38)
42 

Length

Max length6
Median length0
Mean length1.470198675
Min length0

Characters and Unicode

Total characters444
Distinct characters10
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique35 ?
Unique (%)11.6%

Sample

1st row521110
2nd row
3rd row
4th row
5th row262105

Common Values

ValueCountFrequency (%)
228
75.5%
99999116
 
5.3%
2410056
 
2.0%
9989996
 
2.0%
9999924
 
1.3%
9999933
 
1.0%
2233052
 
0.7%
2521052
 
0.7%
7243151
 
0.3%
2525451
 
0.3%
Other values (33)33
 
10.9%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
99999116
21.6%
2410056
 
8.1%
9989996
 
8.1%
9999924
 
5.4%
9999933
 
4.1%
2233052
 
2.7%
2521052
 
2.7%
3131051
 
1.4%
2123151
 
1.4%
2525451
 
1.4%
Other values (32)32
43.2%

Most occurring characters

ValueCountFrequency (%)
9156
35.1%
168
15.3%
262
 
14.0%
553
 
11.9%
045
 
10.1%
322
 
5.0%
420
 
4.5%
87
 
1.6%
66
 
1.4%
75
 
1.1%

Most occurring categories

ValueCountFrequency (%)
Decimal Number444
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
9156
35.1%
168
15.3%
262
 
14.0%
553
 
11.9%
045
 
10.1%
322
 
5.0%
420
 
4.5%
87
 
1.6%
66
 
1.4%
75
 
1.1%

Most occurring scripts

ValueCountFrequency (%)
Common444
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
9156
35.1%
168
15.3%
262
 
14.0%
553
 
11.9%
045
 
10.1%
322
 
5.0%
420
 
4.5%
87
 
1.6%
66
 
1.4%
75
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII444
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
9156
35.1%
168
15.3%
262
 
14.0%
553
 
11.9%
045
 
10.1%
322
 
5.0%
420
 
4.5%
87
 
1.6%
66
 
1.4%
75
 
1.1%

CLASSI_FIN
Categorical

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 17.2 KiB
Most frequent values: 2: 212; 1: 87; 8: 3

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters302
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row2
3rd row2
4th row2
5th row1

Common Values

Value   Count   Frequency (%)
2       212     70.2%
1       87      28.8%
8       3       1.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
2212
70.2%
187
28.8%
83
 
1.0%

Most occurring characters

ValueCountFrequency (%)
2212
70.2%
187
28.8%
83
 
1.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number302
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
2212
70.2%
187
28.8%
83
 
1.0%

Most occurring scripts

ValueCountFrequency (%)
Common302
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
2212
70.2%
187
28.8%
83
 
1.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII302
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2212
70.2%
187
28.8%
83
 
1.0%

AT_ATIVIDA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 12
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 17.8 KiB
Most frequent values: 10: 143; 11: 65; 99: 54; 4: 19; 1: 8; Other values (7): 13

Length

Max length2
Median length2
Mean length1.867549669
Min length0

Characters and Unicode

Total characters564
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique3 ?
Unique (%)1.0%

Sample

1st row10
2nd row99
3rd row10
4th row10
5th row10

Common Values

Value              Count   Frequency (%)
10                 143     47.4%
11                 65      21.5%
99                 54      17.9%
4                  19      6.3%
1                  8       2.6%
9                  4       1.3%
12                 2       0.7%
3                  2       0.7%
(empty)            2       0.7%
2                  1       0.3%
Other values (2)   2       0.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
10143
47.7%
1165
21.7%
9954
 
18.0%
419
 
6.3%
18
 
2.7%
94
 
1.3%
122
 
0.7%
32
 
0.7%
21
 
0.3%
61
 
0.3%

Most occurring characters

ValueCountFrequency (%)
1283
50.2%
0143
25.4%
9112
 
19.9%
419
 
3.4%
23
 
0.5%
32
 
0.4%
71
 
0.2%
61
 
0.2%

Most occurring categories

ValueCountFrequency (%)
Decimal Number564
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1283
50.2%
0143
25.4%
9112
 
19.9%
419
 
3.4%
23
 
0.5%
32
 
0.4%
71
 
0.2%
61
 
0.2%

Most occurring scripts

ValueCountFrequency (%)
Common564
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1283
50.2%
0143
25.4%
9112
 
19.9%
419
 
3.4%
23
 
0.5%
32
 
0.4%
71
 
0.2%
61
 
0.2%

Most occurring blocks

ValueCountFrequency (%)
ASCII564
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1283
50.2%
0143
25.4%
9112
 
19.9%
419
 
3.4%
23
 
0.5%
32
 
0.4%
71
 
0.2%
61
 
0.2%

AT_LAMINA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB
Most frequent values: 1: 154; 2: 133; 3: 13; (empty): 2

Length

Max length1
Median length1
Mean length0.9933774834
Min length0

Characters and Unicode

Total characters300
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row1
3rd row2
4th row2
5th row2

Common Values

Value     Count   Frequency (%)
1         154     51.0%
2         133     44.0%
3         13      4.3%
(empty)   2       0.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1154
51.3%
2133
44.3%
313
 
4.3%

Most occurring characters

ValueCountFrequency (%)
1154
51.3%
2133
44.3%
313
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number300
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1154
51.3%
2133
44.3%
313
 
4.3%

Most occurring scripts

ValueCountFrequency (%)
Common300
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1154
51.3%
2133
44.3%
313
 
4.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII300
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1154
51.3%
2133
44.3%
313
 
4.3%

AT_SINTOMA
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB
Most frequent values: 1: 282; 2: 18; (empty): 2

Length

Max length1
Median length1
Mean length0.9933774834
Min length0

Characters and Unicode

Total characters300
Distinct characters2
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row1
2nd row2
3rd row1
4th row1
5th row1

Common Values

Value     Count   Frequency (%)
1         282     93.4%
2         18      6.0%
(empty)   2       0.7%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1282
94.0%
218
 
6.0%

Most occurring characters

ValueCountFrequency (%)
1282
94.0%
218
 
6.0%

Most occurring categories

ValueCountFrequency (%)
Decimal Number300
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1282
94.0%
218
 
6.0%

Most occurring scripts

ValueCountFrequency (%)
Common300
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1282
94.0%
218
 
6.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII300
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1282
94.0%
218
 
6.0%

TPAUTOCTO
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 4
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Memory size: 18.5 KiB
Most frequent values: (empty): 215; 2: 66; 3: 17; 1: 4

Length

Max length1
Median length0
Mean length0.2880794702
Min length0

Characters and Unicode

Total characters87
Distinct characters3
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2
2nd row
3rd row
4th row
5th row2

Common Values

Value     Count   Frequency (%)
(empty)   215     71.2%
2         66      21.9%
3         17      5.6%
1         4       1.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
266
75.9%
317
 
19.5%
14
 
4.6%

Most occurring characters

ValueCountFrequency (%)
266
75.9%
317
 
19.5%
14
 
4.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number87
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
266
75.9%
317
 
19.5%
14
 
4.6%

Most occurring scripts

ValueCountFrequency (%)
Common87
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
266
75.9%
317
 
19.5%
14
 
4.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII87
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
266
75.9%
317
 
19.5%
14
 
4.6%

COUFINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 9
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Memory size: 18.5 KiB
Most frequent values: (empty): 267; AM: 13; RJ: 8; RO: 4; RR: 3; Other values (4): 7

Length

Max length2
Median length0
Mean length0.2317880795
Min length0

Characters and Unicode

Total characters70
Distinct characters8
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.7%

Sample

1st row
2nd row
3rd row
4th row
5th rowRO

Common Values

Value     Count   Frequency (%)
(empty)   267     88.4%
AM        13      4.3%
RJ        8       2.6%
RO        4       1.3%
RR        3       1.0%
TO        3       1.0%
PA        2       0.7%
AC        1       0.3%
MA        1       0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
am13
37.1%
rj8
22.9%
ro4
 
11.4%
rr3
 
8.6%
to3
 
8.6%
pa2
 
5.7%
ac1
 
2.9%
ma1
 
2.9%

Most occurring characters

ValueCountFrequency (%)
R18
25.7%
A17
24.3%
M14
20.0%
J8
11.4%
O7
 
10.0%
T3
 
4.3%
P2
 
2.9%
C1
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter70
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
R18
25.7%
A17
24.3%
M14
20.0%
J8
11.4%
O7
 
10.0%
T3
 
4.3%
P2
 
2.9%
C1
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
Latin70
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
R18
25.7%
A17
24.3%
M14
20.0%
J8
11.4%
O7
 
10.0%
T3
 
4.3%
P2
 
2.9%
C1
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII70
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
R18
25.7%
A17
24.3%
M14
20.0%
J8
11.4%
O7
 
10.0%
T3
 
4.3%
P2
 
2.9%
C1
 
1.4%

COPAISINF
Real number (ℝ≥0)

HIGH CORRELATION
HIGH CORRELATION
HIGH CORRELATION
ZEROS

Distinct: 9
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.599337748
Minimum: 0
Maximum: 188
Zeros: 232
Zeros (%): 76.8%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 31
Maximum: 188
Range: 188
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 24.82781367
Coefficient of variation (CV): 3.762167451
Kurtosis: 30.66351286
Mean: 6.599337748
Median Absolute Deviation (MAD): 0
Skewness: 5.325838361
Sum: 1993
Variance: 616.4203318
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Most frequent values

Value   Count   Frequency (%)
0       232     76.8%
1       35      11.6%
31      23      7.6%
22      4       1.3%
177     2       0.7%
140     2       0.7%
113     2       0.7%
188     1       0.3%
109     1       0.3%

Smallest values

Value   Count   Frequency (%)
0       232     76.8%
1       35      11.6%
22      4       1.3%
31      23      7.6%
109     1       0.3%
113     2       0.7%
140     2       0.7%
177     2       0.7%
188     1       0.3%

Largest values

Value   Count   Frequency (%)
188     1       0.3%
177     2       0.7%
140     2       0.7%
113     2       0.7%
109     1       0.3%
31      23      7.6%
22      4       1.3%
1       35      11.6%
0       232     76.8%

COMUNINF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 23
Distinct (%): 7.6%
Missing: 0
Missing (%): 0.0%
Memory size: 18.2 KiB
Most frequent values: (empty): 267; 130260: 7; 330340: 3; 172100: 3; 130020: 2; Other values (18): 20

Length

Max length6
Median length0
Mean length0.6953642384
Min length0

Characters and Unicode

Total characters210
Distinct characters9
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique16 ?
Unique (%)5.3%

Sample

1st row
2nd row
3rd row
4th row
5th row110020

Common Values

Value               Count   Frequency (%)
(empty)             267     88.4%
130260              7       2.3%
330340              3       1.0%
172100              3       1.0%
130020              2       0.7%
110020              2       0.7%
330240              2       0.7%
130380              1       0.3%
140005              1       0.3%
130006              1       0.3%
Other values (13)   13      4.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
1302607
20.0%
1721003
 
8.6%
3303403
 
8.6%
3302402
 
5.7%
1300202
 
5.7%
1100202
 
5.7%
1400101
 
2.9%
1300061
 
2.9%
1505531
 
2.9%
2100301
 
2.9%
Other values (12)12
34.3%

Most occurring characters

ValueCountFrequency (%)
080
38.1%
139
18.6%
337
17.6%
221
 
10.0%
410
 
4.8%
58
 
3.8%
68
 
3.8%
74
 
1.9%
83
 
1.4%

Most occurring categories

ValueCountFrequency (%)
Decimal Number210
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
080
38.1%
139
18.6%
337
17.6%
221
 
10.0%
410
 
4.8%
58
 
3.8%
68
 
3.8%
74
 
1.9%
83
 
1.4%

Most occurring scripts

ValueCountFrequency (%)
Common210
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
080
38.1%
139
18.6%
337
17.6%
221
 
10.0%
410
 
4.8%
58
 
3.8%
68
 
3.8%
74
 
1.9%
83
 
1.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII210
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
080
38.1%
139
18.6%
337
17.6%
221
 
10.0%
410
 
4.8%
58
 
3.8%
68
 
3.8%
74
 
1.9%
83
 
1.4%

LOC_INF
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 19
Distinct (%): 6.3%
Missing: 0
Missing (%): 0.0%
Memory size: 18.1 KiB
Most frequent values: (empty): 278; LUAN: 6; AMAZ: 2; ARIQ: 1; MANA: 1; Other values (14): 14

Length

Max length4
Median length0
Mean length0.3112582781
Min length0

Characters and Unicode

Total characters94
Distinct characters19
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique16 ?
Unique (%)5.3%

Sample

1st row
2nd row
3rd row
4th row
5th row

Common Values

Value              Count   Frequency (%)
(empty)            278     92.1%
LUAN               6       2.0%
AMAZ               2       0.7%
ARIQ               1       0.3%
MANA               1       0.3%
BOA                1       0.3%
LUMI               1       0.3%
PORT               1       0.3%
ANGO               1       0.3%
MOCA               1       0.3%
Other values (9)   9       3.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
luan6
25.0%
amaz2
 
8.3%
sint1
 
4.2%
vale1
 
4.2%
moca1
 
4.2%
togo1
 
4.2%
ariq1
 
4.2%
mong1
 
4.2%
sao1
 
4.2%
port1
 
4.2%
Other values (8)8
33.3%

Most occurring characters

ValueCountFrequency (%)
A20
21.3%
N14
14.9%
O10
10.6%
L8
 
8.5%
U7
 
7.4%
M6
 
6.4%
T5
 
5.3%
S4
 
4.3%
G3
 
3.2%
I3
 
3.2%
Other values (9)14
14.9%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter94
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A20
21.3%
N14
14.9%
O10
10.6%
L8
 
8.5%
U7
 
7.4%
M6
 
6.4%
T5
 
5.3%
S4
 
4.3%
G3
 
3.2%
I3
 
3.2%
Other values (9)14
14.9%

Most occurring scripts

ValueCountFrequency (%)
Latin94
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
A20
21.3%
N14
14.9%
O10
10.6%
L8
 
8.5%
U7
 
7.4%
M6
 
6.4%
T5
 
5.3%
S4
 
4.3%
G3
 
3.2%
I3
 
3.2%
Other values (9)14
14.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII94
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A20
21.3%
N14
14.9%
O10
10.6%
L8
 
8.5%
U7
 
7.4%
M6
 
6.4%
T5
 
5.3%
S4
 
4.3%
G3
 
3.2%
I3
 
3.2%
Other values (9)14
14.9%

DEXAME
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 174
Distinct (%): 57.6%
Missing: 0
Missing (%): 0.0%
Memory size: 19.9 KiB
Most frequent values: 2013-02-08: 6; 2013-07-18: 6; 2013-01-18: 5; 2013-12-13: 5; 2013-12-16: 4; Other values (169): 276

Length

Max length10
Median length10
Mean length9.960264901
Min length4

Characters and Unicode

Total characters3008
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique97 ?
Unique (%)32.1%

Sample

1st row2013-01-02
2nd row2013-01-02
3rd row2013-01-03
4th row2013-01-03
5th row2013-01-04

Common Values

Value                Count   Frequency (%)
2013-02-08           6       2.0%
2013-07-18           6       2.0%
2013-01-18           5       1.7%
2013-12-13           5       1.7%
2013-12-16           4       1.3%
2013-09-17           4       1.3%
2013-09-19           4       1.3%
2013-04-16           4       1.3%
2013-10-08           4       1.3%
2013-07-23           4       1.3%
Other values (164)   256     84.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2013-02-086
 
2.0%
2013-07-186
 
2.0%
2013-12-135
 
1.7%
2013-01-185
 
1.7%
2013-09-194
 
1.3%
2013-10-084
 
1.3%
2013-07-234
 
1.3%
2013-04-164
 
1.3%
2013-12-164
 
1.3%
2013-07-124
 
1.3%
Other values (164)256
84.8%

Most occurring characters

ValueCountFrequency (%)
0662
22.0%
-600
19.9%
1589
19.6%
2457
15.2%
3372
12.4%
866
 
2.2%
957
 
1.9%
754
 
1.8%
551
 
1.7%
649
 
1.6%
Other values (5)51
 
1.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number2400
79.8%
Dash Punctuation600
 
19.9%
Lowercase Letter6
 
0.2%
Uppercase Letter2
 
0.1%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0662
27.6%
1589
24.5%
2457
19.0%
3372
15.5%
866
 
2.8%
957
 
2.4%
754
 
2.2%
551
 
2.1%
649
 
2.0%
443
 
1.8%
Lowercase Letter
ValueCountFrequency (%)
o2
33.3%
n2
33.3%
e2
33.3%
Dash Punctuation
ValueCountFrequency (%)
-600
100.0%
Uppercase Letter
ValueCountFrequency (%)
N2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common3000
99.7%
Latin8
 
0.3%

Most frequent character per script

Common
ValueCountFrequency (%)
0662
22.1%
-600
20.0%
1589
19.6%
2457
15.2%
3372
12.4%
866
 
2.2%
957
 
1.9%
754
 
1.8%
551
 
1.7%
649
 
1.6%
Latin
ValueCountFrequency (%)
N2
25.0%
o2
25.0%
n2
25.0%
e2
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII3008
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0662
22.0%
-600
19.9%
1589
19.6%
2457
15.2%
3372
12.4%
866
 
2.2%
957
 
1.9%
754
 
1.8%
551
 
1.7%
649
 
1.6%
Other values (5)51
 
1.7%

RESULT
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 7
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Memory size: 19.6 KiB
Most frequent values: 1: 210; 4: 44; 2: 42; 10: 2; (empty): 2; Other values (2): 2

Length

Max length2
Median length1
Mean length1
Min length0

Characters and Unicode

Total characters302
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2 ?
Unique (%)0.7%

Sample

1st row2
2nd row1
3rd row1
4th row1
5th row4

Common Values

Value     Count   Frequency (%)
1         210     69.5%
4         44      14.6%
2         42      13.9%
10        2       0.7%
(empty)   2       0.7%
6         1       0.3%
3         1       0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
1210
70.0%
444
 
14.7%
242
 
14.0%
102
 
0.7%
61
 
0.3%
31
 
0.3%

Most occurring characters

ValueCountFrequency (%)
1212
70.2%
444
 
14.6%
242
 
13.9%
02
 
0.7%
31
 
0.3%
61
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number302
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
1212
70.2%
444
 
14.6%
242
 
13.9%
02
 
0.7%
31
 
0.3%
61
 
0.3%

Most occurring scripts

ValueCountFrequency (%)
Common302
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
1212
70.2%
444
 
14.6%
242
 
13.9%
02
 
0.7%
31
 
0.3%
61
 
0.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII302
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1212
70.2%
444
 
14.6%
242
 
13.9%
02
 
0.7%
31
 
0.3%
61
 
0.3%

PMM
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 45
Distinct (%): 61.6%
Missing: 229
Missing (%): 75.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 11159.78082
Minimum: 1
Maximum: 100000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 2.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 24.4
Q1: 280
Median: 450
Q3: 3800
95-th percentile: 100000
Maximum: 100000
Range: 99999
Interquartile range (IQR): 3520

Descriptive statistics

Standard deviation: 27080.1356
Coefficient of variation (CV): 2.426583105
Kurtosis: 6.192041416
Mean: 11159.78082
Median Absolute Deviation (MAD): 314
Skewness: 2.738515746
Sum: 814664
Variance: 733333744.2
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=45)
Most frequent values

Value               Count   Frequency (%)
380                 7       2.3%
280                 6       2.0%
100000              5       1.7%
1                   4       1.3%
390                 3       1.0%
210                 2       0.7%
450                 2       0.7%
60                  2       0.7%
592                 2       0.7%
1560                2       0.7%
Other values (35)   38      12.6%
(Missing)           229     75.8%

Smallest values

Value   Count   Frequency (%)
1       4       1.3%
40      1       0.3%
60      2       0.7%
122     1       0.3%
136     1       0.3%
144     1       0.3%
200     1       0.3%
210     2       0.7%
253     1       0.3%
280     6       2.0%

Largest values

Value    Count   Frequency (%)
100000   5       1.7%
66880    2       0.7%
39000    1       0.3%
20240    1       0.3%
16400    1       0.3%
12120    1       0.3%
11720    1       0.3%
10500    1       0.3%
10100    1       0.3%
9040     1       0.3%

PCRUZ
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 6
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Memory size: 18.6 KiB
Most frequent values: (empty): 212; 4: 25; 3: 23; 2: 16; 5: 14

Length

Max length1
Median length0
Mean length0.298013245
Min length0

Characters and Unicode

Total characters90
Distinct characters5
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row5
2nd row
3rd row
4th row
5th row3

Common Values

Value     Count   Frequency (%)
(empty)   212     70.2%
4         25      8.3%
3         23      7.6%
2         16      5.3%
5         14      4.6%
1         12      4.0%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
425
27.8%
323
25.6%
216
17.8%
514
15.6%
112
13.3%

Most occurring characters

ValueCountFrequency (%)
425
27.8%
323
25.6%
216
17.8%
514
15.6%
112
13.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number90
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
425
27.8%
323
25.6%
216
17.8%
514
15.6%
112
13.3%

Most occurring scripts

ValueCountFrequency (%)
Common90
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
425
27.8%
323
25.6%
216
17.8%
514
15.6%
112
13.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII90
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
425
27.8%
323
25.6%
216
17.8%
514
15.6%
112
13.3%

TRA_ESQUEM
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 10
Distinct (%): 3.3%
Missing: 0
Missing (%): 0.0%
Memory size: 18.7 KiB
Most frequent values: (empty): 221; 1: 39; 99: 34; 12: 2; 11: 1; Other values (5): 5

Length

Max length2
Median length0
Mean length0.3940397351
Min length0

Characters and Unicode

Total characters119
Distinct characters6
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique6 ?
Unique (%)2.0%

Sample

1st row99
2nd row
3rd row
4th row
5th row1

Common Values

Value     Count   Frequency (%)
(empty)   221     73.2%
1         39      12.9%
99        34      11.3%
12        2       0.7%
11        1       0.3%
4         1       0.3%
2         1       0.3%
10        1       0.3%
7         1       0.3%
9         1       0.3%

Length

Histogram of lengths of the category

Pie chart

ValueCountFrequency (%)
139
48.1%
9934
42.0%
122
 
2.5%
111
 
1.2%
41
 
1.2%
21
 
1.2%
101
 
1.2%
71
 
1.2%
91
 
1.2%

Most occurring characters

ValueCountFrequency (%)
969
58.0%
144
37.0%
23
 
2.5%
41
 
0.8%
01
 
0.8%
71
 
0.8%

Most occurring categories

ValueCountFrequency (%)
Decimal Number119
100.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
969
58.0%
144
37.0%
23
 
2.5%
41
 
0.8%
01
 
0.8%
71
 
0.8%

Most occurring scripts

ValueCountFrequency (%)
Common119
100.0%

Most frequent character per script

Common
ValueCountFrequency (%)
969
58.0%
144
37.0%
23
 
2.5%
41
 
0.8%
01
 
0.8%
71
 
0.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII119
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
969
58.0%
144
37.0%
23
 
2.5%
41
 
0.8%
01
 
0.8%
71
 
0.8%

DSTRAESQUE
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 24
Distinct (%): 7.9%
Missing: 0
Missing (%): 0.0%
Memory size: 18.7 KiB
Most frequent values: (empty): 269; ARTESUNATO+MEFLOQUINA: 6; ARTESUNATO INJETAVEL: 3; ARTEJUNATO INJETAVEL: 2; ARTESUNATO + MEFLOQUINA: 2; Other values (19): 20

Length

Max length30
Median length0
Mean length2.281456954
Min length0

Characters and Unicode

Total characters689
Distinct characters28
Distinct categories7 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique18 ?
Unique (%)6.0%

Sample

1st rowARTESUNATO INJETAVEL
2nd row
3rd row
4th row
5th row

Common Values

Value                            Count   Frequency (%)
(empty)                          269     89.1%
ARTESUNATO+MEFLOQUINA            6       2.0%
ARTESUNATO INJETAVEL             3       1.0%
ARTEJUNATO INJETAVEL             2       0.7%
ARTESUNATO + MEFLOQUINA          2       0.7%
ARTESUNATO+ MEFLOQUINA           2       0.7%
ARTESUNATO+MEFLOQUINA INFANTIL   1       0.3%
ARTESUNATO+CLINDAMICINA          1       0.3%
ARTESUNATO 50MG (AP)             1       0.3%
ARTESUNATO+MELOQUINA             1       0.3%
Other values (14)                14      4.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
artesunato13
21.3%
artesunato+mefloquina7
 
11.5%
mefloquina7
 
11.5%
injetavel5
 
8.2%
3
 
4.9%
artejunato2
 
3.3%
e2
 
3.3%
primafuina1
 
1.6%
ap1
 
1.6%
31
 
1.6%
Other values (19)19
31.1%

Most occurring characters

ValueCountFrequency (%)
A98
14.2%
N66
9.6%
T64
9.3%
E63
9.1%
O55
 
8.0%
U52
 
7.5%
I43
 
6.2%
R32
 
4.6%
S32
 
4.6%
L30
 
4.4%
Other values (18)154
22.4%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter632
91.7%
Space Separator28
 
4.1%
Math Symbol20
 
2.9%
Decimal Number6
 
0.9%
Other Punctuation1
 
0.1%
Open Punctuation1
 
0.1%
Close Punctuation1
 
0.1%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
A98
15.5%
N66
10.4%
T64
10.1%
E63
10.0%
O55
8.7%
U52
8.2%
I43
6.8%
R32
 
5.1%
S32
 
5.1%
L30
 
4.7%
Other values (9)97
15.3%
Decimal Number
ValueCountFrequency (%)
32
33.3%
02
33.3%
91
16.7%
51
16.7%
Space Separator
ValueCountFrequency (%)
28
100.0%
Math Symbol
ValueCountFrequency (%)
+20
100.0%
Other Punctuation
ValueCountFrequency (%)
/1
100.0%
Open Punctuation
ValueCountFrequency (%)
(1
100.0%
Close Punctuation
ValueCountFrequency (%)
)1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin632
91.7%
Common57
 
8.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
A98
15.5%
N66
10.4%
T64
10.1%
E63
10.0%
O55
8.7%
U52
8.2%
I43
6.8%
R32
 
5.1%
S32
 
5.1%
L30
 
4.7%
Other values (9)97
15.3%
Common
ValueCountFrequency (%)
28
49.1%
+20
35.1%
32
 
3.5%
02
 
3.5%
/1
 
1.8%
91
 
1.8%
51
 
1.8%
(1
 
1.8%
)1
 
1.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII689
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
A98
14.2%
N66
9.6%
T64
9.3%
E63
9.1%
O55
 
8.0%
U52
 
7.5%
I43
 
6.2%
R32
 
4.6%
S32
 
4.6%
L30
 
4.4%
Other values (18)154
22.4%

DTRATA
Categorical

HIGH CARDINALITY
HIGH CORRELATION
HIGH CORRELATION

Distinct: 74
Distinct (%): 24.5%
Missing: 0
Missing (%): 0.0%
Memory size: 18.6 KiB
Most frequent values: None: 216; 2013-06-15: 3; 2013-10-08: 2; 2013-01-18: 2; 2013-09-19: 2; Other values (69): 77

Length

Max length10
Median length4
Mean length5.708609272
Min length4

Characters and Unicode

Total characters1724
Distinct characters15
Distinct categories4 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique61 ?
Unique (%)20.2%

Sample

1st row2013-01-02
2nd rowNone
3rd rowNone
4th rowNone
5th row2013-01-04

Common Values

Value               Count   Frequency (%)
None                216     71.5%
2013-06-15          3       1.0%
2013-10-08          2       0.7%
2013-01-18          2       0.7%
2013-09-19          2       0.7%
2013-12-21          2       0.7%
2013-07-18          2       0.7%
2013-12-06          2       0.7%
2013-02-08          2       0.7%
2013-01-08          2       0.7%
Other values (64)   67      22.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
none216
71.5%
2013-06-153
 
1.0%
2013-02-082
 
0.7%
2013-10-082
 
0.7%
2013-03-142
 
0.7%
2013-01-182
 
0.7%
2013-09-192
 
0.7%
2013-12-212
 
0.7%
2013-07-182
 
0.7%
2013-12-062
 
0.7%
Other values (64)67
 
22.2%

Most occurring characters

ValueCountFrequency (%)
N216
12.5%
o216
12.5%
n216
12.5%
e216
12.5%
0190
11.0%
-172
10.0%
1168
9.7%
2129
7.5%
3104
6.0%
823
 
1.3%
Other values (5)74
 
4.3%

Most occurring categories

ValueCountFrequency (%)
Decimal Number688
39.9%
Lowercase Letter648
37.6%
Uppercase Letter216
 
12.5%
Dash Punctuation172
 
10.0%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0190
27.6%
1168
24.4%
2129
18.8%
3104
15.1%
823
 
3.3%
419
 
2.8%
719
 
2.8%
613
 
1.9%
912
 
1.7%
511
 
1.6%
Lowercase Letter
ValueCountFrequency (%)
o216
33.3%
n216
33.3%
e216
33.3%
Dash Punctuation
ValueCountFrequency (%)
-172
100.0%
Uppercase Letter
ValueCountFrequency (%)
N216
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin864
50.1%
Common860
49.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0190
22.1%
-172
20.0%
1168
19.5%
2129
15.0%
3104
12.1%
823
 
2.7%
419
 
2.2%
719
 
2.2%
613
 
1.5%
912
 
1.4%
Latin
ValueCountFrequency (%)
N216
25.0%
o216
25.0%
n216
25.0%
e216
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII1724
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N216
12.5%
o216
12.5%
n216
12.5%
e216
12.5%
0190
11.0%
-172
10.0%
1168
9.7%
2129
7.5%
3104
6.0%
823
 
1.3%
Other values (5)74
 
4.3%

DT_ENCERRA
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 302
Missing (%): 100.0%
Memory size: 2.5 KiB

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the steepness of the line does not affect r; only the direction of the slope sets its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
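
As a minimal sketch of the calculation just described, the following pure-Python function divides the sample covariance by the product of the sample standard deviations. The function name and the toy data are illustrative assumptions, not part of the report or of any library.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (n - 1 in the denominator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if std_x == 0 or std_y == 0:
        return float("nan")  # r is undefined for a constant variable
    return cov / (std_x * std_y)

# Illustrative data: two exact linear relationships with opposite slopes.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
r_up = pearson_r(x, [2 * v + 7 for v in x])
r_down = pearson_r(x, [-0.5 * v + 1 for v in x])
# Shifting and rescaling x leaves r unchanged (location/scale invariance).
r_shifted = pearson_r([10 * v + 3 for v in x], [2 * v + 7 for v in x])
```

A constant column has zero standard deviation, so the sketch returns NaN for it rather than dividing by zero.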

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
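
The rank-based calculation can be sketched as follows: replace each value by its rank, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is a common convention assumed here; the function names and data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Covariance of the rank variables divided by the product of their standard deviations."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one variable is constant
    # The 1/(n - 1) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

# Illustrative data: a nonlinear but strictly increasing relationship.
x = [1, 2, 3, 4, 5]
rho_cubic = spearman_rho(x, [v ** 3 for v in x])
tied_ranks = average_ranks([10, 20, 20, 30])
```

Because only the ordering matters, the cubic relationship above gets the same ρ as a straight line would.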

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
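
A minimal sketch of that pair-counting definition follows. It implements the plain form described above (often called tau-a), in which tied pairs count as neither concordant nor discordant; statistical libraries frequently apply a tie-adjusted variant instead, so results can differ on data with many ties. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    concordant = discordant = total = 0
    for i, j in combinations(range(len(xs)), 2):
        total += 1
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 is a tie: counted in the total only
    return (concordant - discordant) / total

# Illustrative data: one swapped pair out of six gives (5 - 1) / 6.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
```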

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
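
The sketch below shows one commonly cited form of the bias correction: the chi-squared-based φ² is reduced by its expected value under independence, and the table dimensions are shrunk accordingly. It is an illustration under that assumption, not necessarily the exact code the report generator runs; the function name and data are hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V for two equal-length sequences of nominal values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    r, k = len(row_totals), len(col_totals)
    if n < 2 or r < 2 or k < 2:
        return float("nan")  # undefined when a variable is constant
    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias correction: subtract the expected phi2 under independence and
    # shrink the effective number of rows and columns.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return float("nan")
    return math.sqrt(phi2_corr / denom)

# Illustrative data: a perfectly associated pair and an independent pair.
v_perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["u"] * 50 + ["v"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
```

Clipping the corrected φ² at zero keeps the result inside the 0 to 1 range when the observed association is weaker than chance alone would produce.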

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion by eye.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
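
Nullity correlation can be sketched as an ordinary correlation between per-column "is missing" indicators. The example below assumes missing entries are represented as None; the column contents are invented for illustration and only loosely modelled on fields such as PMM and PCRUZ.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a column that is always or never missing has no nullity variance
    return cov / math.sqrt(var_a * var_b)

# Hypothetical columns: the first two are missing on exactly the same rows,
# the third is filled exactly where the first is missing.
density = [100000.0, None, None, None, 10.0, 80.0]
crosses = ["5", None, None, None, "3", "2"]
followup = [None, "x", "x", "x", None, None]

same_pattern = nullity_correlation(density, crosses)
opposite_pattern = nullity_correlation(density, followup)
```

A value near +1 means two fields tend to be blank together, and a value near -1 means one is filled when the other is blank. Note that this report counts blank strings as values rather than as missing (PCRUZ shows 212 empty entries but Missing: 0), so such fields would need their blanks converted to a missing marker before this kind of analysis is informative.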

Sample

First rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
02B542013-01-0220130120133333045527083532012-12-212012511960-12-184052M6909333304551NaT521110110212312013-01-022100000.0599ARTESUNATO INJETAVEL2013-01-02NaT
12B542013-01-0220130120133333045522883382013-01-012013011961-09-094051F5909333300451NaT2991202013-01-021NaNNoneNaT
22B542013-01-0320130120133333045567534692013-01-032013011976-10-144036F5106333303301NaT2102102013-01-031NaNNoneNaT
32B542013-01-0320130120133333045522883382012-12-312013011985-01-064027M6109333304551NaT2102102013-01-031NaNNoneNaT
42B542013-01-0420130120133333045530012022012-12-272012521965-03-064047M6108333304551NaT262105110212RO11100202013-01-044310.0312013-01-04NaT
52B542013-01-0620130220133333045522704712013-01-062013021982-05-074030F5108333304551NaT110212312013-01-062280.0299ARTESUNATO+MEFLOQUINA2013-01-06NaT
62B542013-01-0720130220133333045527083532012-12-272012521936-07-024076M6409333301701NaT110212PA1150010SNTO2013-01-07412120.0512013-01-07NaT
72B542013-01-0820130220133333045564878152013-01-072013021973-01-224039F5107333304551NaT2102102013-01-081NaNNoneNaT
82B542013-01-0820130220133333045564878152013-01-032013011995-09-044017M6105333304551NaT99999111021231ANGO2013-01-082100000.0599ARTEJUNATO INJETAVEL2013-01-08NaT
92B542013-01-0820130220133333045564878152013-01-032013011999-05-194013F5105333304551NaT99999111021231LUAN2013-01-082100000.0599ARTEJUNATO INJETAVEL2013-01-08NaT

Last rows

TP_NOT ID_AGRAVO DT_NOTIFIC SEM_NOT NU_ANO SG_UF_NOT ID_MUNICIP ID_REGIONA ID_UNIDADE DT_SIN_PRI SEM_PRI DT_NASC NU_IDADE_N CS_SEXO CS_GESTANT CS_RACA CS_ESCOL_N SG_UF ID_MN_RESI ID_RG_RESI ID_PAIS DT_INVEST ID_OCUPA_N CLASSI_FIN AT_ATIVIDA AT_LAMINA AT_SINTOMA TPAUTOCTO COUFINF COPAISINF COMUNINF LOC_INF DEXAME RESULT PMM PCRUZ TRA_ESQUEM DSTRAESQUE DTRATA DT_ENCERRA
2922B542013-12-1720135120133333045522883382013-12-142013501983-06-244030F5408333304551NaT2991102013-12-171NaNNoneNaT
2932B542013-12-1820135120133333045554763212013-12-152013511963-03-064050F5108333304551NaT2112102013-12-181NaNNoneNaT
2942B542013-12-1920135120133333045554763212013-12-192013511954-06-274059F51333304551NaT2992102013-12-191NaNNoneNaT
2952B542013-12-2020135120133333045554763212013-12-202013511983-07-284030M6108333304551NaT2102102013-12-201NaNNoneNaT
2962B542013-12-2120135120133333045554763212013-12-172013512012-02-104001M610333304551NaT11021302013-12-212210.022013-12-21NaT
2972B542013-12-2120135120133333045554763212013-12-172013511977-11-024036F12333304551NaT99999211021302013-12-212200.0272013-12-21NaT
2982B542013-12-2320135220133333045522883382013-12-212013511965-07-264047M6406333304901NaT2992102013-12-231NaNNoneNaT
2992B542013-12-2820135220133333045554763212013-12-262013521969-05-124044M69333303501NaT110212312013-12-282380.0399ARTESUNATO+MELOQUINA2013-12-28NaT
3002B542013-12-3020140120133333045530059922013-12-302014011965-01-204048F51333304551NaT2112102013-12-301NaNNoneNaT
3012B542013-12-3120140120133333045522883382013-12-262013521949-10-114064M6108333303301NaT2101102013-12-311NaNNoneNaT